Abstract. Motivated by a problem in learning theory, we are led
to study the dominant eigenvalue of a class of random matrices.
This turns out to be related to the roots of the derivative of random
polynomials (generated by picking their roots uniformly at random
in the interval [0, 1], although our results extend to other
distributions). This, in turn, requires the study of the statistical behavior
of the harmonic mean of random variables as above, and that, in
turn, leads us to delicate questions of the rate of convergence to
stable laws and tail estimates for stable laws.



Introduction 

The original motivation for the work in this paper was provided
by the first-named author's research in learning theory, specifically
in various models of language acquisition (see [KNN2001], [NKN2001],
[KN2001]) and more specifically yet by the analysis of the speed of
convergence of the memoryless learner algorithm. The setup is described
in some detail in Section 4.1; here we will just recall the essentials.
There is a collection of concepts $R_1, \dots, R_n$ and words which refer
to these concepts, sometimes ambiguously. The teacher generates a
stream of words, referring to the concept $R_1$. This is not known to the
student, but he must learn by, at each step, guessing some concept
$R_i$ and checking for consistency with the teacher's input. The
memoryless learner algorithm consists of picking a concept $R_i$ at random,
and sticking by this choice until it is proven wrong. At this point
another concept is picked randomly, and the procedure repeats. It is
clear that once the student hits on the right answer $R_1$, this will be his
final answer, so the question is then:

How quickly does this method converge to the truth?

Since the method is memoryless, as the name implies, it is clear that
the learning process is a Markov chain, and as is well-known the
convergence rate is determined by the gap between the top (Perron-Frobenius)
eigenvalue and the second largest eigenvalue. However, we are also
interested in a kind of a generic behavior, so we assume that the sizes of
overlaps between concepts are random, with some (sufficiently regular)
probability density function supported in the interval [0, 1], and that
the number of concepts is large. This makes the transition matrix
random (that is, the entries are random variables) - the precise model is
described in Section 4.1. The analysis of convergence speed then comes
down to a detailed analysis of the size of the second largest eigenvalue
and also of the properties of the eigenspace decomposition (the
contents of Section 4.3). Our main results for the original problem (which
is presented in Section 4.4) can be summarized in the following¹:



Theorem A. [4.9, 4.10] Let $N_\Delta$ be the number of steps it takes for
the student to have probability $1 - \Delta$ of learning the concept. Then we
have the following estimates for $N_\Delta$:

• if the distribution of overlaps is uniform, or more generally, the
density function $f(1 - x)$ at $0$ has the form $f(x) = c + O(x^{\delta})$,
$\delta, c > 0$, then there exist positive constants $C_1$, $C_2$ such that
$$\lim_{n\to\infty} P\left( C_1 < \frac{N_\Delta}{|\log \Delta|\, n \log n} < C_2 \right) = 1;$$

• if the probability density function $f(1 - x)$ is asymptotic to
$c x^{\beta} + O(x^{\beta+\delta})$, $\delta, \beta > 0$, as $x$ approaches $0$, then we have
$$\lim_{n\to\infty} P\left( C_1' < \frac{N_\Delta}{|\log \Delta|\, n} < C_2' \right) = 1$$
for some positive constants $C_1'$ and $C_2'$;

• if the asymptotic behavior is as above, but $-1 < \beta < 0$, then
$$\lim_{x\to\infty} P\left( \frac{1}{x} < \frac{N_\Delta}{|\log \Delta|\, n^{1/(1+\beta)}} < x \right) = 1.$$

It should be said that our methods give quite precise estimates on 
the constants in the asymptotic estimate, but the rate of convergence 
is rather poor - logarithmic - so these precise bounds are of limited 
practical importance. 



¹ Here and throughout this section, the reference to the relevant theorem (lemma)
in the main body of the paper is given in square brackets.

Notation. We shall use the notation $a \asymp b$ to mean that $a$ is
asymptotically the same as $b$. We say that $a \sim b$ if $a$ and $b$ have the same
order of growth (in other words, there exist constants $c_1$, $c_2$, $d_1$, $d_2$,
with $c_1, c_2 > 0$, so that $c_1 a + d_1 < b < c_2 a + d_2$). In addition we denote
the expectation of a random variable $x$ by $E(x)$.

Eigenvalues and polynomials. In order to calculate the convergence
rate of the learning algorithm described above, we need to study the
spectrum of a class of random matrices. The matrix $T = (T_{ij})$ is an
$n \times n$ matrix with entries (see Section 4.1)
$$T_{ij} = \begin{cases} a_i, & i = j, \\ (1 - a_i)/(n - 1), & \text{otherwise.} \end{cases}$$
Let $B = \frac{n-1}{n}(I - T)$, so that the eigenvalues of $T$, $\lambda_i$, are related to the
eigenvalues of $B$, $\mu_i$, by $\lambda_i = 1 - n\mu_i/(n - 1)$. In Section 4.2 we show
the following result:



Lemma B. [4.7] Let $p(x) = (x - x_1)\cdots(x - x_n)$, where $x_i = 1 - a_i$.
Then the characteristic polynomial $p_B$ of $B$ satisfies:
$$p_B(x) = \frac{x}{n}\,\frac{dp(x)}{dx}.$$
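The identity in Lemma B can be sanity-checked numerically. The sketch below assumes the reading $B = \frac{n-1}{n}(I - T)$ with $T_{ii} = a_i$ and $T_{ij} = (1 - a_i)/(n-1)$ for $i \ne j$, and compares $\det(xI - B)$ with $\frac{x}{n}p'(x)$ at a few sample points. The helper names (`det`, `char_poly_B`, `x_dp_over_n`), the matrix size, and the seed are illustrative choices, not part of the paper.

```python
import random


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def char_poly_B(a_vals, x):
    """det(x I - B), assuming B = ((n-1)/n) (I - T)."""
    n = len(a_vals)
    scale = (n - 1) / n
    m = []
    for i in range(n):
        row = []
        for j in range(n):
            t = a_vals[i] if i == j else (1.0 - a_vals[i]) / (n - 1)
            identity = 1.0 if i == j else 0.0
            b = scale * (identity - t)
            row.append(x * identity - b)
        m.append(row)
    return det(m)


def x_dp_over_n(a_vals, x):
    """(x / n) p'(x) for p(x) = prod_i (x - x_i) with x_i = 1 - a_i."""
    roots = [1.0 - a for a in a_vals]
    n = len(roots)
    dp = 0.0
    for i in range(n):
        term = 1.0
        for j in range(n):
            if j != i:
                term *= x - roots[j]
        dp += term  # product rule: drop one factor at a time
    return x * dp / n


random.seed(0)
a_vals = [random.random() for _ in range(6)]
for x in (-0.5, 0.3, 1.7):
    print(x, char_poly_B(a_vals, x), x_dp_over_n(a_vals, x))
```

The two printed columns should agree up to rounding; the check is only a spot test at three points, not a proof.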

Lemma B brings us to the following question:

Question 1: Given a random polynomial $p(x)$ whose roots
are all real, and distributed in a prescribed way, what can
we say about the distribution of the roots of the derivative
$p'(x)$?

And more specifically, since the convergence behavior of $T^N$ is
controlled by the top of the spectrum:

Question 1': What can we say about the distribution of the
smallest root of $p'(x)$, given that the smallest root of $p(x)$ is
fixed?

For Question 1' we shall clamp the smallest root of $p(x)$ at 0. Letting
$H_{n-1}$ be the harmonic mean of the other roots of $p(x)$ (which are all
greater than zero with probability 1), our first observation will be

Lemma C. [3.3] The smallest root $\mu_*$ of $p'(x)$ satisfies:
$$\tfrac{1}{2} H_{n-1} \le (n - 1)\mu_* \le H_{n-1}.$$

We will henceforth assume that the roots of the polynomial $p(x)$ are
a sample of size $\deg p(x)$ of a random variable, $x$, distributed in the
interval [0, 1]. In this stochastic setting, it will be shown that $(n - 1)\mu_*$
tends to the harmonic mean of the non-zero roots of $p$ with probability
1, when $n$ is large. It then follows that the study of the distribution of
$\mu_*$ entails the study of the asymptotic behavior of the harmonic mean
of a sample drawn from a distribution on [0, 1].

Statistics of the harmonic mean. In view of the long and honorable
history of the harmonic mean, it seems surprising that its limiting
behavior has not been studied more extensively than it has. Such,
however, does appear to be the case. It should also be noted that
the arithmetic, harmonic, and geometric means are examples of the
"conjugate means", given by
$$m_{\mathcal{F}}(x_1, \dots, x_n) = \mathcal{F}^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}\mathcal{F}(x_i)\right),$$
where $\mathcal{F}(x) = x$ for the arithmetic mean, $\mathcal{F}(x) = \log(x)$ for the
geometric mean, and $\mathcal{F}(x) = 1/x$ for the harmonic mean. The interesting
situation is when $\mathcal{F}$ has a singularity in the support of the distribution
of $x$, and this case seems to have been studied very little, if at all. Here
we will devote ourselves to the study of the harmonic mean.
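The conjugate-mean construction can be written as a single function taking the map and its inverse. This is only an illustrative sketch; the name `conjugate_mean` and the sample values are assumptions made for the example.

```python
import math


def conjugate_mean(xs, F, F_inv):
    """Apply F to each value, average, and map back with the inverse of F."""
    return F_inv(sum(F(x) for x in xs) / len(xs))


xs = [1.0, 2.0, 4.0]
arithmetic = conjugate_mean(xs, lambda x: x, lambda y: y)
geometric = conjugate_mean(xs, math.log, math.exp)
harmonic = conjugate_mean(xs, lambda x: 1.0 / x, lambda y: 1.0 / y)
print(arithmetic, geometric, harmonic)
```

For the harmonic case the map $1/x$ is singular at 0, which is exactly the situation the text singles out: a sample point near 0 dominates the average of $1/x_i$.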

Given $x_1, \dots, x_n$, a sequence of independent random variables
identically distributed in [0, 1] (with common probability density function
$f$), the nonlinear nature of the harmonic mean leads us to consider the
random variable
$$X_n = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}.$$

Since the variables $1/x_i$ are easily seen to have infinite expectation and
variance, our prospects seem grim at first blush, but then we notice
that the variable $1/x_i$ falls straight into the framework of the "stable
laws" of Lévy-Khintchine ([FellerV2]). Stable laws are defined and
discussed in Section 1.2. Which particular stable law comes up depends
on the density function $f(x)$. If we assume that
$$f(x) \asymp c x^{\beta}$$
as $x \to 0$ (for the uniform distribution $\beta = 0$, $c = 1$), we have

Theorem D. If $\beta = 0$, then let $Y_n = X_n - \log n$. The variables $Y_n$
converge in distribution to the unbalanced stable law $G$ with exponent
$\alpha = 1$. If $\beta > 0$, then $X_n$ converges in distribution to $\delta(x - \mathcal{E})$,
where $\mathcal{E} = E(1/x)$, and $\delta$ denotes the Dirac delta function. If
$-1 < \beta < 0$, then $n^{1 - 1/(1+\beta)} X_n$ converges in distribution to a stable law with
exponent $\alpha = 1 + \beta$.
The above result points us in the right direction, since it allows us
to guess the form of the following results ($H_n$ is the harmonic mean of
the variables):

Theorem E. [1.2, 1.3] Let $H_n = 1/X_n$ and $\beta = 0$. Then there exists
a constant $\mathcal{C}_1$ such that
$$\lim_{n\to\infty} E(H_n \log n) = \mathcal{C}_1.$$

Theorem F. [2.1] Suppose $\beta > 0$, let $y = 1/x$, and let $\mathcal{E}$ be the mean
of the variable $y$. Then
$$\lim_{n\to\infty} E(\mathcal{E} H_n) = 1.$$

Finally,

Theorem G. [2.4] Suppose $\beta < 0$. Then there exists a constant $\mathcal{C}_2$
such that
$$\lim_{n\to\infty} E\left(H_n / n^{1 - 1/(1+\beta)}\right) = \mathcal{C}_2.$$

Theorem H ([1.4, 2.2], Law of large numbers for harmonic mean). Let $\beta = 0$
and let $a > 0$. Then
$$\lim_{n\to\infty} P(|H_n \log n - \mathcal{C}_1| > a) = 0,$$
where $\mathcal{C}_1$ is as in the statement of Theorem E. If $\beta > 0$, and $\mathcal{E}$ is as
in the statement of Theorem F, then
$$\lim_{n\to\infty} P(|H_n - 1/\mathcal{E}| > a) = 0.$$
The proofs of the results for $\beta = 0$ require estimates of the speed of
convergence in Theorem D. The speed of convergence results we obtain
(in Section B) are not best possible, but the arguments are simple and
general. The estimates can be summarized as follows:

Theorem I. [B.1] Assume $\beta = 0$. Let $g_n$ be the density associated to
$X_n - \log n$, and let $g$ be the probability density of the unbalanced stable
law with exponent $\alpha = 1$. Then we have (uniformly in $x$):
$$g_n(x) = g(x) + O(\log^2 n / n).$$

In addition to the laws of large numbers we have the following
limiting distribution results:

Theorem J. [1.5, 1.6] For $\alpha = 1$, the random variable $\log n\,(H_n \log n - \mathcal{C}_1)$
converges to a variable with the distribution function $1 - G(-x/\mathcal{C}_1^2)$,
where $G$ is the limiting distribution (of exponent $\alpha = 1$) of the variables
$Y_n = X_n - c\log n$ and $\mathcal{C}_1 = 1/c$.

Theorem K. [2.3] For $\alpha > 1$, the random variable $n^{1 - 1/\alpha}(H_n - 1/\mathcal{E})$
converges in distribution to a variable with distribution function
$1 - G(-x\mathcal{E}^2)$, where $G$ is the unbalanced stable distribution of exponent $\alpha$.

Theorem L. [2.15] For $0 < \alpha < 1$, the random variable $H_n / n^{1 - 1/\alpha}$
converges in distribution to the variable with distribution function
$1 - G(1/x)$, where $G$ is the unbalanced stable distribution of exponent $\alpha$.

The paper is organized as follows. In Section 1 we study some
statistical properties of a harmonic mean of $n$ variables and in particular,
find the expected value of its mean as $n \to \infty$. In Section 3 we explore
the connection between the harmonic mean and the smallest root of the
derivative of certain random polynomials. In Section 4 we uncover the
connection between the rate of convergence of the memoryless learner
algorithm, eigenvalues of certain stochastic matrices and the harmonic
mean. The more technical material can be found in the Appendix.
In Section A of the Appendix we present an explicit derivation of the
stable law for a particular example with $\alpha = 1$. In Section B we
evaluate the rate of convergence of the distribution of the inverse of the
harmonic mean to its stable law.

1. Harmonic mean 

1.1. Preliminaries. Let $x_1, \dots, x_n$ be positive real numbers. The
harmonic mean, $H_n$, is defined by
$$\frac{1}{H_n} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}. \tag{1}$$
Let $x_1, \dots, x_n$ be independent random variables, identically uniformly
distributed in [0, 1]. We will study statistical properties of their
harmonic mean, $H_n$, with emphasis on limiting behavior as $n$ becomes
large.

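We will use auxiliary variables $X_n$ and $Y_n$, defined as
$$X_n = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i} = \frac{1}{H_n}, \qquad Y_n = X_n - \log n, \tag{2}$$
and also variables $y_i = 1/x_i$. The distribution of $y_i$ is easily seen to be
given by
$$\mathfrak{F}(z) = P(y_i < z) = \begin{cases} 0, & z < 1, \\ 1 - \dfrac{1}{z}, & \text{otherwise.} \end{cases} \tag{3}$$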



A quick check reveals that $y_i$ has infinite mean and variance, so the
Central Limit Theorem is not much help in the study of $X_n$. Luckily,
however, $X_n$ converges to a stable law, as we shall see. A very brief
introduction to stable laws is given in the next section.
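The "quick check" can be spelled out. For uniform $x_i$ one has $P(y_i < z) = P(x_i > 1/z) = 1 - 1/z$ for $z \ge 1$, so $y_i$ has density $z^{-2}$ on $[1, \infty)$; the following worked computation uses only that density:

```latex
E(y_i) = \int_1^{\infty} z \cdot \frac{dz}{z^{2}}
       = \int_1^{\infty} \frac{dz}{z}
       = \lim_{M \to \infty} \log M = \infty ,
\qquad
E(y_i^{2}) = \int_1^{\infty} z^{2} \cdot \frac{dz}{z^{2}}
           = \int_1^{\infty} dz = \infty .
```

The first moment diverges only logarithmically, which is consistent with the $\log n$ centering used for $Y_n$.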

1.2. Stable limit laws. Consider an infinite sequence of independent
identically distributed random variables $y_1, \dots, y_n, \dots$, with some
probability distribution function, $\mathfrak{F}$. Typical questions studied in
probability theory are the following.

Let $S_n = \sum_{j=1}^{n} y_j$. How is $S_n$ distributed? What can we say
about the distribution of $S_n$ as $n \to \infty$?

The best known example is one covered by the Central Limit Theorem
of de Moivre-Laplace: if $\mathfrak{F}$ has finite mean $\mathcal{E}$ and variance $\sigma^2$, then
$(S_n - n\mathcal{E})/(\sqrt{n}\,\sigma)$ converges in distribution to the normal distribution
(see, e.g., [FellerV2]). In view of this result, one says that the variable
$y$ belongs to the domain of attraction of a non-singular distribution
$G$, if there are constants $a_1, \dots, a_n, \dots$ and $b_1, \dots, b_n, \dots$ such that
the sequence of variables $Y_n = a_n S_n - b_n$ converges in distribution to
$G$. It was shown by Lévy and by Khintchine that having a domain
of attraction constitutes severe restrictions on the distribution as well
as the norming sequences $\{a_n\}$ and $\{b_n\}$. To wit, one can always pick
$a_n = n^{-1/\alpha}$, $0 < \alpha \le 2$. It turns out that $\alpha$ is determined by the
limiting behavior of the distribution $\mathfrak{F}$ so that

$$\lim_{|x|\to\infty} |x|^{\alpha+1}\frac{d\mathfrak{F}}{dx} = \begin{cases} Cp, & x > 0, \\ Cq, & x < 0, \end{cases} \tag{4}$$

where $p + q = 1$. In that case, $G$ is called a stable distribution of
exponent $\alpha$. Note that the case when $\alpha = 2$ corresponds to the Central
Limit Theorem. If the variable $y$ belongs to the domain of attraction of
a stable distribution of exponent $\alpha > 1$, then $y$ has a finite expectation
$\mathcal{E}$; just as in the case $\alpha = 2$, we can choose $b_n = n^{1 - 1/\alpha}\mathcal{E}$. When $\alpha < 1$,
the variable $y$ does not have a finite expectation, and it turns out that
we can take $b_n = 0$; for $\alpha = 1$, we can take $b_n = c\log n$, where $c$ is
a constant depending on $\mathfrak{F}$. Thus, the normal distribution is a stable
distribution of exponent 2 (and it is also unique, up to scale and shift).
This is one of the few cases where we have an explicit expression for the
density of a stable distribution; in other cases we only have expressions
for their characteristic functions. The characteristic function $\Psi(k)$ of a
distribution function $G(x)$ is defined to be $\int_{-\infty}^{\infty}\exp(ikx)\,dG(x)$, that is,
as the Fourier transform of the density function. Lévy and Khintchine
showed that the characteristic functions of stable distributions can be
parameterized as follows:
(5) 

log*(fc) = 



a(a— 1) 

C [^7r — sign ki{p — q) log \k\\ \k\ + const, a = 1, 
where the constants $C$, $p$ and $q$ can be defined by the following limits:

(6) lim 1 / ~^ X \ r = Cp, 

(7) lim = Cq, 

x^oo 1 — $(x) 

and $p + q = 1$; the quantities $p$, $q$ and $C$ here are the same as in formula
(4). We will say that the stable law is unbalanced if $p = 1$ or $q = 1$
above. This will happen if the support of the variable $y$ is positive -
this will be the only case we will consider in the sequel.

If $\chi(k)$ is the characteristic function of our variable $y$, then the
characteristic function of the stable distribution, $\Psi(k)$, satisfies
$$\Psi(k) = \lim_{n\to\infty}\Psi_n(k), \tag{8}$$
where
$$\Psi_n(k) = \exp(-i b_n k)\,\chi^n(a_n k). \tag{9}$$
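The form of the characteristic function of $Y_n = a_n S_n - b_n$ follows directly from independence of the $y_j$. As a short worked step (writing $\chi$ for the characteristic function of a single $y_j$):

```latex
E\, e^{ikY_n}
  = E \exp\Bigl( ik \Bigl( a_n \sum_{j=1}^{n} y_j - b_n \Bigr) \Bigr)
  = e^{-ikb_n} \prod_{j=1}^{n} E\, e^{i (a_n k) y_j}
  = e^{-ikb_n} \bigl[ \chi(a_n k) \bigr]^{n} .
```

Convergence in distribution of $Y_n$ to the stable law then corresponds to pointwise convergence of this expression as $n \to \infty$.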

Notation. Throughout the paper we will use the notation $G_n$ for the
distribution function of the random variable $Y_n$ and $G$ for the
corresponding stable distribution; $g_n$ for the density of $Y_n$ and $g$ for the
stable density; $\Psi_n$ for the characteristic function of $G_n$ and $\Psi$ for the
characteristic function of the stable distribution.

1.3. Limiting distribution of the harmonic mean $H_n$ for $\alpha = 1$.

Let us go back to the example of Section 1.1, where the random
variables $x_i$ were uniformly distributed in [0, 1]. We will study the limiting
behavior of the distribution of quantities related to $S_n = \sum_{j=1}^{n} 1/x_j$.

The distribution function of the variables $y_i = 1/x_i$ is given by (3),
which implies $p = 1$ and $q = 0$, see formulas (6) and (7). From the
behavior of the tails of the distribution $\mathfrak{F}$ we see that $\alpha = 1$, so the
norming sequence should be taken $a_n = 1/n$, $b_n = \log n$. Then the
distribution $G_n$ of the variable $Y_n$ (given by equation (2)) converges to
a stable distribution $G$. The explicit form of the corresponding stable
density, $g$, can be obtained by taking the Fourier transform of the
characteristic function $\Psi$ in formula (5) (see also [FellerV2, Chapter
XVII]). A direct derivation of formula (10) is given in Appendix A:
$$g(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iky}\, e^{-\pi|k|/2 - ik(\log|k| - 1 + \gamma)}\,dk, \tag{10}$$
where $\gamma$ is Euler's constant.

Remark 1.1. Results of this section can be easily generalized to any
density $f$ of the random variable $x$ which satisfies $\lim_{x\to 0} f(x) > 0$. For
any such distribution we obtain a stable law with exponent $\alpha = 1$.

Next, let us analyze the limiting behavior of the harmonic mean, $H_n$.
To begin we will compute the behavior of the mean of $H_n$,
$$E(H_n) = \int_{-\infty}^{\infty}\frac{1}{x + \log n}\,dG_n. \tag{11}$$
It turns out that for this, we do not need the explicit form of the stable
distribution $G$ of $Y_n$; it is enough to use the following information about
the behavior of the tails:
$$\lim_{x\to-\infty} xG(x) = 0, \qquad \lim_{x\to\infty} x(1 - G(x)) = 1. \tag{12}$$
These equations can be obtained from (6) and (7), see XVII.5 in [FellerV2].
The exact asymptotics of the tails are computed in [IbLin1971, Chapter 2].

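Let us pick a large cutoff $c_n$; we take $c_n$ to tend to $\infty$, but in such a
way that $c_n = o(\log n)$, e.g. $c_n = \sqrt{\log n}$, and rewrite equation (11) as
$$E(H_n) = I_1(c_n) + I_2(c_n) + I_3(c_n), \tag{13}$$
where
$$I_1(c_n) = \int_{-\infty}^{-c_n}\frac{1}{x + \log n}\,dG_n, \tag{14}$$
$$I_2(c_n) = \int_{-c_n}^{c_n}\frac{1}{x + \log n}\,dG_n, \tag{15}$$
$$I_3(c_n) = \int_{c_n}^{\infty}\frac{1}{x + \log n}\,dG_n. \tag{16}$$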

We estimate these integrals separately, using equation (12), the
observation that $G_n(c_n) = 0$ for $c_n < 1 - \log n$, and the estimate on the
convergence speed of $G_n$ to $G$ as obtained in Section B. Since we are
integrating over an interval of length bounded by a constant times $\log n$,
it is more than sufficient for the speed of convergence to the stable
density to be of order $\log^2 n/n$, see Theorem B.1. Integrating by parts,
we obtain
$$\int_{1-\log n}^{-c_n}\frac{dG}{x + \log n} = \frac{G(-c_n)}{-c_n + \log n} + \int_{1-\log n}^{-c_n}\frac{G(x)}{(x + \log n)^2}\,dx. \tag{17}$$
The first term is seen to be $o\left(\frac{1}{\log n - c_n}\right)$, so given our choice of $c_n$,
it is $o(1/\log n)$. The integral in the right hand side of equation (17) is
asymptotically (in $c_n$) smaller than
$$\int_{1-\log n}^{-c_n}\frac{dx}{-x\,(x + \log n)^2} = O\left(\frac{1}{\log n}\right), \tag{18}$$
and therefore, $I_1 = o(1/\log n)$. To show that $I_3 = o(1/\log n)$, we note
that the integrand is dominated by $1/\log n$, while $\lim_{n\to\infty}\int_{c_n}^{\infty} dG_n = 0$.
For $I_2$ we have the trivial estimate (since $1/(x + \log n)$ is monotonic for
$x > -\log n$):
$$\frac{G_n(c_n) - G_n(-c_n)}{\log n + c_n} \le I_2 \le \frac{G_n(c_n) - G_n(-c_n)}{\log n - c_n}, \tag{19}$$
from which it follows that $\lim_{n\to\infty} I_2 \log n = 1$. To summarize, we have
shown

Theorem 1.2. For the variable $x$ uniformly distributed in [0, 1],
$\lim_{n\to\infty} E(\log n\, H_n) = 1$.
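A small Monte Carlo experiment gives a feel for how slowly $E(H_n \log n)$ approaches its limit for uniform samples. This is an illustrative sketch only: the function names, sample sizes, number of trials, and seed are assumptions chosen for the example, and the printed values are noisy estimates rather than exact expectations.

```python
import math
import random


def harmonic_mean_uniform(n, rng):
    """Harmonic mean of n independent uniform samples.

    1 - rng.random() lies in (0, 1], which avoids division by zero.
    """
    return n / sum(1.0 / (1.0 - rng.random()) for _ in range(n))


def estimate_scaled_mean(n, trials, seed=0):
    """Monte Carlo estimate of E(H_n * log n)."""
    rng = random.Random(seed)
    total = sum(harmonic_mean_uniform(n, rng) for _ in range(trials))
    return math.log(n) * total / trials


for n in (100, 1000, 10000):
    print(n, estimate_scaled_mean(n, trials=100))
```

Because the correction terms are of relative size comparable to $1/\log n$, the estimates at these sample sizes should still sit visibly below 1, in line with the remark in the introduction that the rate of convergence is logarithmic.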

Remark 1.3. For a general density of $x$ satisfying $\lim_{x\to 0} f(x) > 0$,
we have $Y_n = 1/H_n - c\log n$, and Theorem 1.2 generalizes to
$\lim_{n\to\infty} E(\log n\, H_n) = \mathcal{C}_1 = 1/c$.

In addition, we have the following weak law of large numbers for $H_n$:

Theorem 1.4. For any $\epsilon > 0$, $\lim_{n\to\infty} P(|H_n \log n - 1| > \epsilon) = 0$.

Proof. Note that
$$P(H_n \log n - 1 > \epsilon) = P\left(X_n < \frac{\log n}{1 + \epsilon}\right),$$
while
$$P(H_n \log n - 1 < -\epsilon) = P\left(X_n > \frac{\log n}{1 - \epsilon}\right).$$
Both probabilities decrease roughly as $1/(\epsilon \log n)$ using the estimates
(12). □

The above weak law indicates that if we are to hope for a limiting
distribution for $H_n$, we need to normalize it differently than by
multiplying by $\log n$. An examination of the argument above shows that
the appropriate normalization is $H_n \log^2 n - \log n$. Indeed, we have the
following

Theorem 1.5. The distribution of the random variable $H_n \log^2 n - \log n$
converges to that of a variable with distribution function $1 - G(-x)$,
where $G$ is the limiting (stable) distribution (of exponent $\alpha = 1$) of the
variables $Y_n = X_n - \log n$.

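Proof. The proof is quite simple. Indeed, since
$$H_n = \frac{1}{Y_n + \log n},$$
we write
$$P(H_n \log^2 n - \log n < a) = P\left(\frac{\log^2 n}{Y_n + \log n} - \log n < a\right) = P\left(\frac{-Y_n \log n}{Y_n + \log n} < a\right).$$
Since $Y_n + \log n > 0$, we can continue:
$$P\left(\frac{-Y_n \log n}{Y_n + \log n} < a\right) = P\left(\frac{-a\log n}{a + \log n} < Y_n\right) \sim P(Y_n > -a) \to 1 - G(-a),$$
where we have assumed that $n$ is large enough that $a + \log n > 0$. □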



Remark 1.6. For a general density of $x$ satisfying $\lim_{x\to 0} f(x) > 0$,
Theorem 1.5 can be generalized in the following way: the random
variable $\log n\,(H_n \log n - \mathcal{C}_1)$ converges in distribution to a variable with
distribution function $1 - G(-x/\mathcal{C}_1^2)$, where $G$ is the limiting
distribution (of exponent $\alpha = 1$) of the variables $Y_n = X_n - c\log n$ and $\mathcal{C}_1 = 1/c$.

Theorem 1.5 could be viewed as a kind of an extension of Zolotarev's
identity (see [FellerV2, Chapter XVII, Section 6] and [IbLin1971,
Theorem 2.3.4]):

Let $\alpha > 1$. Then the density $p(x; \alpha)$ of the unbalanced
stable law satisfies
$$x\,p(x; \alpha) = x^{-\alpha}\,p\left(x^{-\alpha}; \frac{1}{\alpha}\right). \tag{20}$$

2. Limiting distribution of the harmonic mean $H_n$ for $\alpha \ne 1$

Let us consider other types of the distribution of the variable $x$ and
study the limiting behavior of the corresponding harmonic mean. If
the density $f$ of the variable $x$ behaves as
$$f(x) \sim x^{\beta} \tag{21}$$
near $x = 0$, then we have for the density of $y = 1/x$:
$d\mathfrak{F}(y)/dy \sim |y|^{-(\beta+2)}$ as $|y| \to \infty$, which gives $\alpha = \beta + 1$ as the exponent of the
stable law. Using the material of Section 1.2 and the definition of $H_n$,
we obtain:
$$Y_n = \begin{cases} n^{1 - 1/\alpha}\,\dfrac{1}{H_n}, & 0 < \alpha < 1, \\[1ex] n^{1 - 1/\alpha}\left(\dfrac{1}{H_n} - \mathcal{E}\right), & \alpha > 1 \end{cases} \tag{22}$$
(here $\mathcal{E} = E(y)$).

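2.1. The case $\beta > 0$.

Theorem 2.1. If $\beta > 0$, then $\lim_{n\to\infty} E(H_n) = 1/\mathcal{E}$.

Proof. If $\beta > 0$ (i.e. $\alpha > 1$), then we have
$$\lim_{n\to\infty} E(H_n) = \lim_{n\to\infty} E\left(\left(\frac{Y_n}{n^{1 - 1/\alpha}} + \mathcal{E}\right)^{-1}\right) = \lim_{n\to\infty}\int_{-\infty}^{\infty}\frac{dG_n}{x\,n^{1/\alpha - 1} + \mathcal{E}} = \frac{1}{\mathcal{E}}.$$
□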

There is also the following Weak Law of Large Numbers:

Theorem 2.2. If $\beta$ in equation (21) is positive, then
$$\lim_{n\to\infty} P\left(\left|H_n - \frac{1}{\mathcal{E}}\right| > \epsilon\right) = 0.$$



Proof. We have 

H _ - 



#n-^|>e) = P(|-^A^>e 



£ 2 < 



(n > ^-^fi- 



Since $\alpha = \beta + 1 > 1$, in the limit $n \to \infty$ this quantity tends to
zero. □

In fact, we can use a manipulation akin to that in the proof of
Theorem 1.5 to show:

Theorem 2.3. The random variable $n^{1 - 1/\alpha}(H_n - 1/\mathcal{E})$ converges in
distribution to a variable with distribution function $1 - G(-x\mathcal{E}^2)$, where
the distribution $G$ is the unbalanced stable distribution of exponent $\alpha$.

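Proof.
$$P\left(n^{1 - 1/\alpha}\left(H_n - \frac{1}{\mathcal{E}}\right) < a\right) = P\left(n^{1 - 1/\alpha}\left(\frac{1}{\mathcal{E} + Y_n n^{1/\alpha - 1}} - \frac{1}{\mathcal{E}}\right) < a\right) = P\left(\frac{Y_n}{\mathcal{E} + Y_n n^{1/\alpha - 1}} > -a\mathcal{E}\right).$$
The quantity $\mathcal{E} + Y_n n^{1/\alpha - 1}$ is positive because $Y_n > n^{1 - 1/\alpha}(1 - \mathcal{E})$, so
we can write
$$P\left(\frac{Y_n}{\mathcal{E} + Y_n n^{1/\alpha - 1}} > -a\mathcal{E}\right) = P\left(Y_n > \frac{-a\mathcal{E}^2}{1 + a\mathcal{E}\,n^{1/\alpha - 1}}\right) \sim P(Y_n > -a\mathcal{E}^2) \to 1 - G(-a\mathcal{E}^2),$$
where we have assumed that $n$ is large enough that $1 + a\mathcal{E}\,n^{1/\alpha - 1} > 0$. □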

2.2. The case $-1 < \beta < 0$.

Theorem 2.4. For $-1 < \beta < 0$, there is a constant $\mathcal{C}_2$ such that
$$\lim_{n\to\infty} E\left(\frac{H_n}{n^{1 - 1/\alpha}}\right) = \mathcal{C}_2.$$

Proof. For $-1 < \beta < 0$ (or $0 < \alpha < 1$) we would like to reason as
follows:
$$\lim_{n\to\infty} E\left(\frac{H_n}{n^{1 - 1/\alpha}}\right) = \lim_{n\to\infty} E\left(\frac{1}{Y_n}\right) = \lim_{n\to\infty}\int_{-\infty}^{\infty}\frac{dG_n}{x} = \int_{-\infty}^{\infty}\frac{dG}{x}. \tag{23}$$
Since the function $1/x$ is unbounded, the weak convergence of the
distributions $G_n$ to the stable distribution $G$ is not enough to justify
the last equality in the sequence (23) above. To justify it we need
the following lemmas:

Lemma 2.5. Let $y_1, \dots, y_n$ be positive independent identically
distributed random variables. Let $S_n = \sum_{i=1}^{n} y_i$. Then,
$$P(S_n < a) \le [P(y_1 < a)]^n.$$
Proof. Note that $S_n \ge \max_{1 \le i \le n} y_i$. □

Now, in our case
$$G_n(a) = P\left(\sum_{i=1}^{n} y_i < a n^{1/\alpha}\right) \le \left[P\left(y_1 < a n^{1/\alpha}\right)\right]^n = \left[P\left(x_1 > \frac{1}{a n^{1/\alpha}}\right)\right]^n,$$
where the inequality follows from Lemma 2.5 (and recall that $x_i = 1/y_i$).
The probability $P(x_1 > b)$ has the following properties:

A) $P(x_1 > b) = 0$ for $b > 1$,

B) $1 - P(x_1 > b) \sim c b^{\alpha}$, for $b \ll 1$,

C) $P(x_1 > b) < 1$ for $b > 0$.

Lemma 2.6. $G_n(a) = 0$ for $a < n^{-1/\alpha}$.

Proof. Follows from the definition of $G_n$ and Property A. □
Lemma 2.7. There exists a $b_0$ such that $1 - P(x_1 > b) < 2c' b^{\alpha}$ for all
$b < b_0$ for some $c' > 0$.

Proof. This follows from Property B, with $c' = 2c$. □

Lemma 2.8. If $a n^{1/\alpha} > 1/b_0$ ($b_0$ as in the statement of Lemma 2.7),
then
$$G_n(a) \le \left(1 - c' a^{-\alpha} n^{-1}\right)^n \sim \exp\left(-c' a^{-\alpha}\right).$$

Proof. Follows from Lemma 2.5. □

Lemma 2.9. If $a n^{1/\alpha} \le 1/b_0$, then $G_n(a) \le [P(x_1 > b_0)]^n$.

Proof. Follows immediately from Lemma 2.5. □
Now we write:
$$\int_{-\infty}^{\infty}\frac{dG_n}{x} = \left(\int_{-\infty}^{n^{-1/\alpha}} + \int_{n^{-1/\alpha}}^{n^{-1/\alpha}/b_0} + \int_{n^{-1/\alpha}/b_0}^{C} + \int_{C}^{\infty}\right)\frac{dG_n}{x} = I_0(n) + I_1(n) + I_2(n) + I_3(n).$$
To analyze the above decomposition, we should first belabor the
obvious:

Lemma 2.10.
$$\int_a^b\frac{dG_n}{x} = \frac{G_n(b)}{b} - \frac{G_n(a)}{a} + \int_a^b\frac{G_n(x)}{x^2}\,dx.$$
Proof. Integration by parts. □

Lemma 2.11. $I_0(n) = 0$.

Proof. The integrand vanishes in the interval by Lemma 2.6. □

Lemma 2.12.
$$\lim_{n\to\infty} I_1(n) = 0.$$
Proof. By Lemma 2.9, $G_n \le [P(x_1 > b_0)]^n$. The result follows by
integration by parts (Lemma 2.10). □

Lemma 2.13.
$$\lim_{n\to\infty} I_2(n) \le \frac{\exp\left(-c' C^{-\alpha}\right)}{C} + \int_0^C\frac{\exp\left(-c' x^{-\alpha}\right)}{x^2}\,dx.$$
Proof. Follows from Lemma 2.10 and Lemma 2.8. □

Lemma 2.14.
$$\lim_{n\to\infty} I_3(n) = \int_C^{\infty}\frac{dG}{x}.$$
Proof. This follows from the weak convergence of $G_n$ to $G$. □



The derivation (23) is justified. Indeed, if we make the constant $C$
above large, we see that the integral of $dG_n/x$ is bounded, hence so is
the integral of $dG/x$. Convergence follows from the dominated
convergence theorem (or by making $C$ small). We have incidentally shown
that the density of the stable law decays exponentially as $x \to 0^+$
(the exact expression can be found in [IbLin1971, Chapter 2]). □

Theorem 2.15. The quantity $H_n/n^{1 - 1/\alpha}$ converges in distribution to
the variable with distribution function $1 - G(1/x)$, where $G$ is the
unbalanced stable law of exponent $\alpha$.

Proof. The proof is immediate. □

3. A CLASS OF RANDOM POLYNOMIALS 

Let $x_1, \dots, x_n$ be independent identically distributed random
variables with values between zero and one. Let us consider polynomials
whose roots are located at $x_1, \dots, x_n$:
$$p(x) = \prod_{i=1}^{n}(x - x_i) = x^n + \sum_{i=0}^{n-1} c_i x^i. \tag{24}$$
Given the distribution of $x_i$, we would like to know the distribution law
of the roots of the derivative of $p(x)$.

3.1. Uniformly distributed roots. Let us denote the roots of
$p'(x)$ by $\mu_i$, $1 \le i \le n - 1$, and assume that $\mu_i \le \mu_{i+1}$ for all $i$. It is
convenient to denote the smallest of the $x_j$ by $m_1$, i.e. $m_1 = \min_j x_j$, the
second smallest of the $x_j$ as $m_2$, and so forth. It is clear that
$$m_i \le \mu_i \le m_{i+1}, \qquad 1 \le i \le n - 1. \tag{25}$$
We now assume that the $x_j$ are independently uniformly distributed
in [0, 1]. The distribution of $m_1$ is easy to compute: the probability
that $m_1 > a$ is simply the probability that all of the $x_j$ are greater
than $a$, which is to say,
$$P(m_1 > a) = (1 - a)^n.$$
Using this distribution function, one can show that
$$E(m_1) = \frac{1}{n + 1}.$$
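The expectation of the smallest root can be checked with the tail formula for a non-negative random variable, using only $P(m_1 > a) = (1 - a)^n$ for $0 \le a \le 1$:

```latex
E(m_1) = \int_0^1 P(m_1 > a)\, da
       = \int_0^1 (1 - a)^{n}\, da
       = \Bigl[ -\frac{(1 - a)^{n+1}}{n + 1} \Bigr]_0^1
       = \frac{1}{n + 1} .
```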
In fact, it is not hard to see that $E(m_i) = i/(n + 1)$; the reader may
wish to consult [FellerV2] (page 34). We thus have:
$$\frac{i}{n + 1} \le E(\mu_i) \le \frac{i + 1}{n + 1}, \qquad 1 \le i \le n - 1. \tag{26}$$
In particular, for large values of $n$ we have the estimate
$$E(\mu_*) \sim \frac{1}{n},$$
where the notation $\mu_*$ is used for the smallest root of the derivative.

3.2. More precise locations of roots of the derivative, given
that the smallest root of the polynomial is fixed. In the previous
section we have noted that if the roots of $p(x)$ are distributed uniformly
in [0, 1], then so are the roots of $p'(x)$. In order to understand better
the distribution of the roots of $p'(x)$, first let
$$p(x) = (x - x_1)\cdots(x - x_n),$$
then we can write
$$p'(x) = p(x)\sum_{i=1}^{n}\frac{1}{x - x_i}.$$
In the generic case where $p(x)$ has no multiple roots, a root $\mu$ of $p'(x)$
satisfies the equation
$$\sum_{j=1}^{n}\frac{1}{x_j - \mu} = 0. \tag{27}$$
This was interpreted by Gauss (in the more general context of complex
roots) as saying that $\mu$ is in equilibrium in a force field where force is
proportional to the inverse of distance, and the "masses" are at the
points $x_1, \dots, x_n$. Gauss used this simple observation to deduce the
Gauss-Lucas theorem to the effect that the zeros of the derivative lie
in the convex hull of the zeros of the polynomial (see [Marden1966]).



We will use it to get more precise location information on the zeros. In
particular, consider the smallest root $\mu_*$ of $p'(x)$. It is attracted from
the left only by the root $x_1$ of $p$, and from the right by all the other
roots, so we see

Lemma 3.1. For all $2 \le i \le n$, $(m_1 + m_i)/2 \ge \mu_*$, with equality if
and only if $n = 2$.

Remark 3.2. In the sequel, we shall assume that the smallest root of
$p$ equals zero (i.e. $m_1 = 0$; for simplicity of notation we assume that
$x_n = 0$).

Inequalities (25) still give a good estimate for the roots $\mu_2, \dots, \mu_{n-1}$.
One can similarly show that
$$\frac{i - 1}{n} \le E(\mu_i) \le \frac{i}{n}, \qquad 2 \le i \le n - 1. \tag{28}$$
However, for $\mu_1 = \mu_*$, inequalities (25) give $0 \le \mu_* \le m_2$. For
uniformly distributed $x_i$ this tells us that $\mu_*$ decays like $1/n$ or faster. We
would like to obtain a more precise estimate for the large $n$ behavior
of $\mu_*$.

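The random polynomial, $p(x)$, now has the form
$$p(x) = x\prod_{i=1}^{n-1}(x - x_i) = x\sum_{i=0}^{n-1} c_i x^i.$$
We need to estimate the smallest root of $p'$, $\mu_*$.

Theorem 3.3.
$$\frac{1}{2}\left(\sum_{i=1}^{n-1}\frac{1}{x_i}\right)^{-1} \le \mu_* \le \left(\sum_{i=1}^{n-1}\frac{1}{x_i}\right)^{-1}.$$

Proof. The smallest root $\mu_*$ satisfies the equation
$$\frac{1}{\mu_*} = \sum_{i=1}^{n-1}\frac{1}{x_i - \mu_*}. \tag{29}$$
By Lemma 3.1,
$$\frac{x_i}{2} \le x_i - \mu_* \le x_i, \qquad 1 \le i \le n - 1. \tag{30}$$
The result follows immediately from equations (29) and (30). □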
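The two-sided bound on the smallest critical point can be checked numerically. The sketch below assumes $p(x) = x\prod_i (x - x_i)$ with all $x_i > 0$, so that the smallest root of $p'$ is the unique solution of $1/\mu = \sum_i 1/(x_i - \mu)$ in $(0, \min_i x_i)$, and finds it by bisection. The function name, sample size, and seed are illustrative choices.

```python
import random


def smallest_critical_point(xs, iters=200):
    """Smallest root of p'(x) for p(x) = x * prod(x - x_i), all x_i > 0.

    On (0, min(xs)) the function g(mu) = 1/mu - sum 1/(x_i - mu) decreases
    from +infinity to -infinity, so bisection finds its unique zero.
    """
    def g(mu):
        return 1.0 / mu - sum(1.0 / (x - mu) for x in xs)

    lo, hi = 0.0, min(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break  # interval can no longer be split in floating point
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)
xs = [1.0 - rng.random() for _ in range(20)]  # nonzero roots in (0, 1]
mu_star = smallest_critical_point(xs)
s = sum(1.0 / x for x in xs)
print(0.5 / s, mu_star, 1.0 / s)
```

The middle printed value should lie between the other two; multiplying all three by the number of nonzero roots gives the same statement in terms of their harmonic mean.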



Now it is clear that in order to find an estimate for $\mu_*$, we need to
study the behavior of $\left(\sum_{i=1}^{n-1} 1/x_i\right)^{-1}$. In terms of the harmonic mean
of independent random variables $x_1, \dots, x_{n-1}$, we have
$$\frac{1}{2(n - 1)} H_{n-1} \le \mu_* \le \frac{1}{n - 1} H_{n-1}, \tag{31}$$
or, for large values of $n$,
$$E(\mu_*) \sim \frac{1}{n} E(H_n). \tag{32}$$
Using Theorems 1.2 and 1.4, we readily obtain
$$E(\mu_*) \sim \frac{1}{n \log n}. \tag{33}$$



4. A CLASS OF STOCHASTIC MATRICES 

Let $T$ be an $n$ by $n$ matrix constructed as follows:
$$T_{ij} = \begin{cases} (1 - a_i)/(n - 1), & i \ne j, \\ a_i, & i = j, \end{cases}$$
where the numbers $a_i$ are independently distributed between 0 and 1.
We want to study the large $n$ behavior of the second largest eigenvalue
of $T$ (the largest eigenvalue is equal to 1). We will denote this
eigenvalue as $\lambda_*$. In the next section we will provide some motivation for
this choice of stochastic matrices.

4.1. Motivation: the memoryless learner algorithm. The
following is a typical learning theory setup (see [Niyogi1998]). We have
$n$ sets (which we can think of as concepts), $R_1, \dots, R_n$. Each set $R_i$
is equipped with a probability measure $\nu_i$. The similarity matrix $A$ is
defined by $a_{ij} = \nu_i(R_j)$. Since the $\nu_i$ are probability measures, we see
that $0 \le a_{ij} \le 1$ and $a_{ii} = 1$ for all $i$. Now, the teacher generates a
sequence of $N$ examples referring to a single concept $R_k$, and the task
of the student is to guess the $k$ (i.e. to learn the concept $R_k$), hopefully
with high confidence.

The learner has a number of algorithms available to him. For
instance, the student may decide in advance that the concept being
explained is $R_1$, and ignore the teacher's input, insisting forever more that
the concept is $R_1$. While this algorithm occasionally results in
spectacular success, the probability of this is independent of the number of
examples, and is inversely proportional to the number $n$ of available
concepts. Here we will consider a more practical and mathematically
interesting algorithm, namely,

Memoryless learner algorithm. The student picks his initial guess 
at random, as before. However, now he evaluates the teacher's exam- 
ples, and if the current guess is incorrect (i.e. if the teacher's example is 
inconsistent with the current guess), he switches his guess at random. 
The name of the algorithm stems from the fact that the student keeps 
no memory of the history of his guesses, and will occasionally switch 
his guess to one previously rejected. 

It is clear that with the memoryless learner algorithm, the student
will never be able to learn the set $R_k$ if $R_k \subset R_i$. We call such a
situation unlearnable, and do not consider it in the sequel. In terms
of the similarity matrix, this can be rephrased as the assumption that
$a_{ij} < 1$, $i \neq j$.

To define our mathematical model further, we will assume that the
student picks the initial guess uniformly: $p^{(0)} = (1/n, \ldots, 1/n)^T$. The
discrete time evolution of the vector $p^{(t)}$ is a Markov process with
transition matrix $T^{(k)}$, which depends on the teacher's concept, $R_k$,
and the similarity matrix, A. That is:

(34)  $T^{(k)}_{ij} = \begin{cases} (1 - a_{ki})/(n-1), & i \neq j, \\ a_{ki}, & i = j. \end{cases}$

After N examples, the probability that the student believes that the
correct concept is $R_j$ is given by the j-th component of the vector
$(p^{(N)})^T = (p^{(0)})^T (T^{(k)})^N$. In particular, the probability that the
student's belief corresponds to reality (that is, j = k) is given by:

(35)  $Q_{kk}(N) = \left[(p^{(0)})^T (T^{(k)})^N\right]_k.$

It is clear that the dynamics of the memoryless learner algorithm is
completely encoded by the matrix $T^{(k)}$ defined above by (34).

We are interested in the rate of convergence as a function of n, the
number of possible concepts. We define the convergence rate of the
algorithm as the rate of the convergence to 0 of the difference

$1 - Q_{kk}(N).$
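As a concrete illustration of (34) and (35), the following sketch builds the transition matrix for a small made-up similarity row and propagates the uniform initial guess. Indices are 0-based, the function names and the numbers in `a_row` are hypothetical, and the teacher's concept is taken to be the first one.

```python
def transition_matrix(a_row):
    """Transition matrix (34) for a fixed teacher concept k.

    a_row[i] plays the role of a_{ki}; the entry for the teacher's own
    concept must be 1, which makes that state absorbing.
    """
    n = len(a_row)
    return [[a_row[i] if i == j else (1.0 - a_row[i]) / (n - 1)
             for j in range(n)]
            for i in range(n)]

def success_probabilities(T, k, steps):
    """Return [Q_kk(0), ..., Q_kk(steps)] for the uniform initial guess."""
    n = len(T)
    p = [1.0 / n] * n                      # p^(0) = (1/n, ..., 1/n)
    history = [p[k]]
    for _ in range(steps):
        # one step of the row vector: p <- p T
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
        history.append(p[k])
    return history

a_row = [1.0, 0.2, 0.5, 0.7, 0.1]          # made-up overlaps a_{ki}, k = 0
T = transition_matrix(a_row)
Q = success_probabilities(T, k=0, steps=200)
print(Q[0], Q[10], Q[200])                 # starts at 1/n and approaches 1
```

Because the teacher's concept is an absorbing state, the sequence $Q_{kk}(N)$ never decreases, and the difference $1 - Q_{kk}(N)$ is the quantity whose decay rate is studied below.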

In order to simplify notation, let us set k = 1 and skip the
corresponding subscript/superscript. In order to evaluate the convergence rate of
the memoryless learner algorithm, let us represent the matrix $T^{(1)} = T$
as follows:

(36)  $T = V \Lambda W,$

where the diagonal matrix $\Lambda$ consists of the eigenvalues of T, which we
call $\lambda_i$, $1 \le i \le n$; representation (36) is possible in the generic case.
The columns of the matrix V are the right eigenvectors of T, $v_i$. The
rows of the matrix W are the left eigenvectors of T, $w_j$, normalized
to satisfy $\langle w_j, v_i \rangle = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker symbol (so that
VW = WV = I). The eigenvalues of T satisfy $|\lambda_i| \le 1$. We have

$T^N = V \Lambda^N W.$

Let us arrange the eigenvalues in decreasing order, so that $\lambda_1 = 1$ and
$\lambda_2 = \lambda_*$ is the second largest eigenvalue (we assume that it has
multiplicity one). If N is large, we have $\lambda_i^N \ll \lambda_*^N$ for all $i \ge 3$, so only
the two largest eigenvalues need to be taken into account. This
means that in order to evaluate $T^N$ we only need the following
eigenvectors: $v_1 = (1/n, 1/n, \ldots, 1/n)^T$, $v_2$, $w_1 = (n, 0, \ldots, 0)$, and $w_2$ (it
is possible to check that the contribution from the other components
contains multipliers $\lambda_i^N$ with i > 2 and thus can be neglected, see the
computation for $C_n$ in Section 4.3). It follows that



(37)  $Q_{11}(N) = 1 - C_n \lambda_*^N,$

where

(38)  $C_n = -\frac{1}{n} \sum_{j=1}^{n} [v_2]_j [w_2]_1.$



The convergence rate of the memoryless learner algorithm can be found
by estimating $\lambda_*$ and $C_n$. It turns out that a good understanding of $\lambda_*$
(Section 4.2) helps us also estimate $C_n$ (this is done in Section 4.3).



4.2. Second largest eigenvalue and the smallest root of the
derivative of a random polynomial. Let Z = I - T, and let $x_i = 1 - a_i$.
The matrix Z satisfies $Z_{ii} = x_i$, while $Z_{ij} = -x_i/(n-1)$ for $j \neq i$.
We have

(39)  $Z = \frac{n}{n-1} D_x \left(I - \frac{1}{n} J_n\right),$

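where $J_n$ is the n x n matrix of all ones, and $D_x$ is a diagonal matrix
whose i-th element is $x_i$. It is convenient to introduce the matrices

(40)  $M_n = I - \frac{1}{n} J_n = \begin{pmatrix} 1 - 1/n & -1/n & \cdots & -1/n \\ -1/n & 1 - 1/n & \cdots & -1/n \\ \vdots & & \ddots & \vdots \\ -1/n & -1/n & \cdots & 1 - 1/n \end{pmatrix}$

and

(41)  $B = D_x M_n.$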



The second largest eigenvalue of T, which we denote as $\lambda_*$, and the
smallest nontrivial eigenvalue of B, $\mu_*$, are related as

(42)  $\lambda_* = 1 - \frac{n}{n-1} \mu_*.$

In what follows we will write down the characteristic polynomial of
B. Let us recall the following

Fact 4.1. Let A be an n x n matrix. Then the coefficient of $x^{n-k}$ in
the characteristic polynomial $p_A(x)$ of A (defined to be $\det(x I_n - A)$)
is given by

$\sum_{k\text{-element subsets } S \text{ of } \{1,\ldots,n\}} (-1)^k \det m_S,$

where $m_S$ is the matrix obtained from A by deleting those rows and
columns whose indices are not elements of S (we call $m_S$ the minor of
A corresponding to S).
We will need the following lemmas:

Lemma 4.2. Let A be an n x n matrix, and let $D_x$ be as above. Let
$m_{i_1,\ldots,i_k}$ be the $i_1, \ldots, i_k$ minor of A (that is, the sub-matrix of the $i_1, \ldots, i_k$
rows and $i_1, \ldots, i_k$ columns of A). Then the minor $\gamma_{i_1,\ldots,i_k}$ of the matrix
$C = D_x A$ satisfies

$\det \gamma_{i_1,\ldots,i_k} = \det m_{i_1,\ldots,i_k} \prod_{l=1}^{k} x_{i_l}.$

Proof. This is immediate, since the j-th row of C is $x_j$ times the j-th
row of A. □



Lemma 4.3. The characteristic polynomial of $M_n = I - \frac{1}{n} J_n$ equals
$x(x - 1)^{n-1}$.

Proof. Immediate, since the bottom eigenvalue of $M_n$ (given in
equation (40)) is zero and the rest are 1. □

Lemma 4.4. All the k x k minors of $M_n$ (defined in the statement of
Lemma 4.3) are equal.

Proof. By inspection: all the k x k minors of $M_n$ are $M_k + \left(\frac{1}{k} - \frac{1}{n}\right) J_k$. □

Lemma 4.5. The determinants $d_k$ of the k x k minors of $M_n$ are equal
to $\frac{n-k}{n}$.

Proof. We know that $\binom{n}{k} d_k = \binom{n-1}{k}$, from Lemmas 4.3 and 4.4.
From this the assertion follows immediately. □
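Lemmas 4.4 and 4.5 are easy to confirm for a small case with exact rational arithmetic. The following is only a sanity-check sketch: the value n = 5 and the helper names `det`, `M` and `principal_minor` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import combinations

def det(mat):
    """Determinant of a square matrix of Fractions (exact Gaussian elimination)."""
    m = [row[:] for row in mat]
    size, result = len(m), Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)          # singular matrix
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result            # a row swap flips the sign
        result *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for cc in range(c, size):
                m[r][cc] -= factor * m[c][cc]
    return result

def M(n):
    """M_n = I - J_n / n with exact rational entries."""
    return [[Fraction(n - 1, n) if i == j else Fraction(-1, n)
             for j in range(n)] for i in range(n)]

def principal_minor(mat, S):
    """Sub-matrix of mat keeping the rows and columns listed in S."""
    return [[mat[i][j] for j in S] for i in S]

n = 5
Mn = M(n)
# every k x k principal minor should have determinant (n - k) / n
minors_ok = all(det(principal_minor(Mn, S)) == Fraction(n - k, n)
                for k in range(1, n + 1)
                for S in combinations(range(n), k))
print(minors_ok)
```

For k = 2, for instance, each minor has determinant $(1 - 1/n)^2 - 1/n^2 = (n-2)/n$, in agreement with the lemma.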

Now, let

$p_D(x) = x^n + \sum_{i=0}^{n-1} c_i x^i$

be the characteristic polynomial of $D_x$.

Lemma 4.6. The characteristic polynomial of B, $p_B(x)$, is given by:

(43)  $p_B(x) = x^n + \sum_{i=0}^{n-1} \frac{i}{n} c_i x^i,$

where $c_i$ are as above.
Proof. From Lemma 4.2 combined with Lemma 4.5 and Fact 4.1, we see that the
coefficient of $x^i$ in $p_B(x)$ is given by

$(-1)^{n-i} \frac{i}{n} \sum_{(n-i)\text{-element subsets } S \text{ of } \{1,\ldots,n\}} \prod_{j \in S} x_j.$

The sum is just the (n-i)-th elementary symmetric function of $x_1, \ldots, x_n$,
which is equal to $(-1)^{n-i} c_i$. The assertion follows. □

Notice that the constant term of $p_B$ vanishes, so we can write

$p_B(x) = x q(x),$

where

$q(x) = x^{n-1} + \sum_{i=0}^{n-2} \frac{i+1}{n} c_{i+1} x^i.$

But obviously

$q(x) = \frac{1}{n} \frac{d p_D(x)}{dx},$

so we have

Lemma 4.7. The characteristic polynomials of the matrix B defined
in (41) and the diagonal matrix $D_x$ with elements $x_i$ are related by

$p_B(x) = \frac{x}{n} p'_D(x).$

This relates the eigenvalues of the matrix B and the zeros of the
polynomial q(x) (and $p'_D(x)$). In its turn, the smallest nontrivial eigenvalue of B
is related to the second largest eigenvalue of our matrix T by equation
(42).

We can see that studying the second largest eigenvalue of a stochastic
matrix of class (34) is reduced to the problem of the smallest root of
the derivative of the stochastic polynomial of class (0), with $x_i = 1 - a_i$.
Note that by the definition of matrix $T^{(k)}$, one of the quantities
$1 - a_{ki} = x_i$ is equal to zero. This means that in order to find the
distribution of the second largest eigenvalue of such a matrix, we need
to refer to Section 3.2, i.e. the case where one of the roots of the random
polynomial was fixed to zero, and the rest were distributed uniformly.
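The chain of identities in this section (the factorization of Z = I - T and the relation between the characteristic polynomials of B and $D_x$) can be verified exactly for a small example. The sketch below uses the Faddeev-LeVerrier recursion to obtain $\det(xI - B)$; that algorithm, the helper names and the particular rational values of $x_i$ are choices made here for illustration, not taken from the text.

```python
from fractions import Fraction

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(xI - A) via Faddeev-LeVerrier."""
    n = len(A)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0)
              for j in range(n)] for i in range(n)]
        AMk = matmul(A, M)
        coeffs[n - k] = -sum(AMk[i][i] for i in range(n)) / k
    return coeffs

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod (x - r)."""
    coeffs = [Fraction(1)]
    for r in roots:
        new = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] += c        # contribution of x * c x^i
            new[i] -= r * c        # contribution of -r * c x^i
        coeffs = new
    return coeffs

# made-up values x_i = 1 - a_i, with x_1 = 0 for the teacher's concept
xs = [Fraction(0), Fraction(1, 7), Fraction(2, 5), Fraction(3, 4), Fraction(9, 10)]
n = len(xs)

M_n = [[Fraction(n - 1, n) if i == j else Fraction(-1, n) for j in range(n)]
       for i in range(n)]
B = [[xs[i] * M_n[i][j] for j in range(n)] for i in range(n)]        # B = D_x M_n
T = [[1 - xs[i] if i == j else xs[i] / (n - 1) for j in range(n)]
     for i in range(n)]
Z = [[(1 if i == j else 0) - T[i][j] for j in range(n)] for i in range(n)]

p_D = poly_from_roots(xs)
p_B = char_poly(B)
# x p_D'(x) / n has coefficient (i / n) c_i at x^i
expected = [Fraction(i, n) * c for i, c in enumerate(p_D)]
print(p_B == expected)
```

Since $Z = \frac{n}{n-1} B$, every eigenvalue $\mu$ of B corresponds to the eigenvalue $1 - \frac{n}{n-1}\mu$ of T, which is the content of (42).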

4.3. Eigenvectors of stochastic matrices. Next, let us study
eigenvectors of stochastic matrices, and derive an estimate for $C_n$ in equation
(38). Consider the matrix Z defined in equation (39). We can write
$Z = V D_\mu W$, where V and W are the matrices of right and left
eigenvectors (respectively) of Z, and $D_\mu$ is a diagonal matrix whose entries
are the eigenvalues of Z. We know that the right eigenvector of Z
corresponding to the eigenvalue 0 is the vector $v_1 = (1, \ldots, 1)^T$, while
the left eigenvector is the vector $w_1 = (1, 0, \ldots, 0)$. To write down the
eigenvector $v_i$ (i > 1) we write $v_i = v_1 + u_i$, where $\langle v_1, u_i \rangle = 0$; we
can always normalize $v_i$ so that this is possible. If the corresponding
eigenvalue is $\mu_i$, we write the eigenvalue equation:

$\mu_i v_i = Z v_i = \frac{n}{n-1} D_x \left(I - \frac{1}{n} J_n\right)(v_1 + u_i) = \frac{n}{n-1} D_x u_i.$

This results in the following equations for $u_{ij}$, the j-th coordinate of
$u_i$ (for j > 1; $u_{i1} = -1$):

(44)  $\mu_i + \mu_i u_{ij} = \frac{n}{n-1} x_j u_{ij},$

and so

(45)  $u_{ij} = \frac{\mu_i}{\frac{n}{n-1} x_j - \mu_i}.$

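On the other hand, the eigenvalue equation for $w_i$ is

(46)  $\mu_i w_i = Z^T w_i = \frac{n}{n-1} \left(I - \frac{1}{n} J_n\right) D_x w_i,$

resulting in the following equations for the coordinates:

(47)  $\mu_i w_{ij} = \frac{n}{n-1} \left( x_j w_{ij} - \frac{1}{n} \sum_{k=1}^{n} x_k w_{ik} \right).$

If we assume that $x_1 = 0$, then setting j = 1, we get

(48)  $\mu_i w_{i1} = \frac{n}{n-1} \left( -\frac{1}{n} \sum_{k=1}^{n} x_k w_{ik} \right),$

and so

(49)  $\frac{1}{n} \sum_{k=1}^{n} x_k w_{ik} = -\frac{n-1}{n} \mu_i w_{i1},$

and equation (47) can be rewritten (for j > 1) as

(50)  $\mu_i w_{ij} = \frac{n}{n-1} \left( x_j w_{ij} + \frac{n-1}{n} \mu_i w_{i1} \right)$

to get

(51)  $w_{ij} = \frac{\mu_i w_{i1}}{\mu_i - \frac{n}{n-1} x_j}.$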

Now, let us assume that i = 2, and in addition $\mu_2 \ll x_j$, j > 1. While
it follows immediately from Lemma 3.1 that $\mu_2 < x_2$, we comment that
by our Weak Law of Large Numbers (Theorem 1.4), the probability
that $\mu_2 > c/(n \log n)$ goes to 0 with c, whereas the probability that
$|x_k - (k-1)/n| > c_2/n$ goes to zero with $c_2$ (detailed results on the
distribution of order statistics can be found in [FellerV2, Chapter I]).

Remark 4.8. The assumption that $\mu_* \ll x_2$ is least justified if we have
reason to believe that $x_2 \ll 1/n$.

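Thus we can write approximately:

(52)  $v_{2j} \approx \frac{\mu_2}{x_j} + 1,$

while

(53)  $w_{2j} \approx -\frac{\mu_2 w_{21}}{x_j}.$

Since we must have $\langle w_2, v_2 \rangle = 1$, we have:

$-\mu_2 w_{21} \sum_{j=2}^{n} \frac{1}{x_j} \left( \frac{\mu_2}{x_j} + 1 \right) = 1,$

which implies that

$-\frac{1}{w_{21}} = \mu_2 \sum_{j=2}^{n} \frac{1}{x_j} + \mu_2^2 \sum_{j=2}^{n} \frac{1}{x_j^2},$

which, in turn, implies that $1/2 < |w_{21}| < 1$. This means that we have
the following estimate for the quantity $C_n$ in (38):

$1 < -\frac{1}{n} w_{21} \sum_{j=2}^{n} v_{2j} < 2,$

i.e.

(54)  $1 < C_n < 2.$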

4.4. Convergence of the memoryless learner algorithm. Let us
assume that the overlaps between concepts, $a_i = a_{1i}$ in the matrix
T, are independent random variables distributed according to density
$\tilde f(a)$. Then the variables $x_i = 1 - a_i$ have the probability density
$f(x) = \tilde f(1 - x)$. Our results for the rate of convergence of the memoryless
learner algorithm can be summarized in the following

Theorem 4.9. Let us assume that the density of overlaps, f(x),
approaches a nonzero constant as $x \to 0$. Then in order for the learner
to pick up the correct set with probability $1 - \Delta$, we need to have at
least

(55)  $N_\Delta \sim |\log \Delta| \, (n \log n)$

sampling events.

Proof. Combining equations (37), (54) and (42) we can see that in order
for the learner to pick up the correct set with probability $1 - \Delta$, we
need to have at least

(56)  $N_\Delta \sim |\log \Delta| / \mu_*$

sampling events. Since $\beta = 0$ (see equation (21)), we have $\alpha = 1$.
Using bounds (31) which relate $\mu_*$ to the harmonic mean, and the weak
law of large numbers (Theorem 1.4), we obtain estimate (55). This
estimate should be understood in the following sense: as $n \to \infty$, the
probability that the ratio $\mu_*^{-1}/(n \log n)$ deviates from 1 by a constant
amount tends to zero. Therefore, the right hand side of (56) behaves
like the right hand side of (55) with probability which tends to one as
n tends to infinity. □
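The estimate (56) can be compared with a direct computation of the number of examples needed. The sketch below iterates the Markov chain until $1 - Q_{11}(N) \le \Delta$ and compares the result with $|\log \Delta| / \mu_*$. It is only an illustration under stated assumptions: the values $x_j$ form a deterministic grid standing in for a uniform sample, and the names `smallest_mu` and `samples_needed` are hypothetical.

```python
import math

def smallest_mu(xs, iters=200):
    """Smallest positive zero of 1/x - sum_j 1/(x_j - x), found by bisection."""
    lo, hi = 0.0, min(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = 1.0 / mid - sum(1.0 / (xj - mid) for xj in xs)
        if g > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def samples_needed(xs, delta):
    """First N with 1 - Q_11(N) <= delta for the chain T_ii = 1 - x_i,
    T_ij = x_i / (n - 1), started from the uniform distribution."""
    n = len(xs) + 1
    x = [0.0] + list(xs)          # x_1 = 0: the teacher's concept is absorbing
    p = [1.0 / n] * n
    N = 0
    while 1.0 - p[0] > delta:
        # (p T)_j = p_j (1 - x_j) + sum_{i != j} p_i x_i / (n - 1)
        leak = sum(p[i] * x[i] for i in range(n)) / (n - 1)
        p = [p[j] * (1.0 - x[j]) + leak - p[j] * x[j] / (n - 1)
             for j in range(n)]
        N += 1
    return N

n = 50
xs = [(j - 0.5) / (n - 1) for j in range(1, n)]   # deterministic stand-in sample
delta = 1e-3

mu_star = smallest_mu(xs)
N_exact = samples_needed(xs, delta)
N_estimate = abs(math.log(delta)) / mu_star
ratio = N_exact / N_estimate
print(N_exact, N_estimate, ratio)
```

The two numbers agree up to a factor close to one; the discrepancy comes from the order-one constant $C_n$ and from the factor n/(n-1) in (42), both of which are dropped in (56).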

For other distributions we have

Theorem 4.10. If the probability density of overlaps, f(x), is
asymptotic to $x^\beta + O(x^{\beta+\delta})$, $\delta, \beta > 0$, as x approaches 0, then

$N_\Delta \sim |\log \Delta| \, n;$

if $-1 < \beta < 0$, then

$\lim_{x \to \infty} P\left( \frac{1}{x} < \frac{N_\Delta}{|\log \Delta| \, n^{1/(1+\beta)}} < x \right) = 1.$

Proof. The proof uses the results on the harmonic mean in Section 1. □
Appendix A. Derivation of the stable law for the
uniform distribution of $x_i$

Here we will provide an explicit derivation of the stable law,
equation (|T(]D, in the case where the random variables $x_i$ are uniformly
distributed between zero and one. The characteristic function
corresponding to the distribution of $y_i = 1/x_i$, equation (||), is given by

(57)  $\chi(k) = \int_1^\infty \frac{e^{iky}}{y^2} \, dy.$

This can be evaluated explicitly; here we only present the computations
for positive k, to avoid clutter. We have

(58)  $\chi(k) = -|k|\pi/2 + \cos k + k \, \mathrm{Si}(k) + i \left[ \sin k - k \, \mathrm{Ci}(|k|) \right],$

where Si and Ci are sin- and cos- integrals, respectively (this expression
is obtained with Mathematica). The behavior of $\chi(k)$ at 0 can be easily
computed using the above formula:

(59)  $\chi(k) = 1 - i(\gamma - 1)k - \frac{\pi}{2} k - i k \log k + \frac{1}{2} k^2 + o(k^2),$

where $\gamma \approx 0.577216$ is Euler's constant.
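The agreement between the closed form (58) and the expansion (59) can be checked numerically for small positive k. The sketch below evaluates Si and Ci from their standard power series (truncated at 20 terms, which is ample for k below 1); the function names are illustrative, and the remainder in (59) is simply dropped.

```python
import math

EULER_GAMMA = 0.5772156649015329

def sine_integral(k, terms=20):
    """Si(k) = sum_{m>=0} (-1)^m k^(2m+1) / ((2m+1) (2m+1)!)."""
    return sum((-1) ** m * k ** (2 * m + 1)
               / ((2 * m + 1) * math.factorial(2 * m + 1))
               for m in range(terms))

def cosine_integral(k, terms=20):
    """Ci(k) = gamma + log k + sum_{m>=1} (-1)^m k^(2m) / (2m (2m)!), k > 0."""
    return (EULER_GAMMA + math.log(k)
            + sum((-1) ** m * k ** (2 * m) / (2 * m * math.factorial(2 * m))
                  for m in range(1, terms)))

def chi_exact(k):
    """Closed form (58), for k > 0."""
    return complex(-k * math.pi / 2 + math.cos(k) + k * sine_integral(k),
                   math.sin(k) - k * cosine_integral(k))

def chi_expansion(k):
    """Expansion (59) without its o(k^2) remainder, for k > 0."""
    return (1 - 1j * (EULER_GAMMA - 1) * k - math.pi * k / 2
            - 1j * k * math.log(k) + k * k / 2)

for k in (0.1, 0.01):
    print(k, abs(chi_exact(k) - chi_expansion(k)))
```

The difference shrinks roughly like $k^3$, consistent with the $o(k^2)$ remainder in (59).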

We can also obtain the asymptotics for $\chi(k)$ directly, as follows.
First, we change variables, and set u = ky, to obtain

$\chi(k) = k \int_k^\infty \frac{e^{iu}}{u^2} \, du.$

Let

$I(k) = \int_k^\infty \frac{e^{iu}}{u^2} \, du$ and $I_R(k) = \int_k^R \frac{e^{iu}}{u^2} \, du.$

Clearly, $I(k) = \lim_{R \to \infty} I_R(k)$. Since the integrand (call it f(u)) has
no poles, except at 0, we see that $I_R(k) + J(k) + L(R) - K_R(k) = 0$,
where J(k) is the integral of f(u) along the positive quadrant of the
circle |z| = k, L(R) the integral along the positive quadrant of the circle
|z| = R, and $K_R(k)$ is the integral along the imaginary axis, from ik
to iR. It is easy to see that $\lim_{R \to \infty} L(R) = 0$, while

$k J(k) = -i \int_0^{\pi/2} \exp(i k e^{i\theta} - i\theta) \, d\theta,$

the Taylor series of which is easily evaluated by expanding the
integrand in a Taylor series. The integral $K_R(k)$ reduces to the integral of
$\exp(-u)/u^2$, the asymptotics of which can be easily obtained by repeated
integration by parts.

The characteristic function of $Y_n = a_n S_n - b_n$, $\Psi_n$, can be obtained
by setting $a_n = 1/n$, and $b_n = \log n$, and using equation (59):

(60)  $\Psi_n(k) = \exp(-ik \log n) \left( \chi(k/n) \right)^n = \exp\left( -\left( \frac{\pi}{2} + i(\gamma - 1) \right) k - ik \log k \right) \left( 1 + O\left( \frac{k^2 \log^2 n}{n} \right) \right),$

where we expanded the exponential in its Taylor series. The expression
for g (equation (0)) follows as we take the limit $n \to \infty$ and perform
the Fourier transform of (60).



Appendix B. Rate of convergence to stable law

Since some of the quantities we are trying to estimate depend on n,
it is not enough for us to know that the distributions of the quantities
$X_n$ converge to a stable law, but it is necessary to have good estimates
on the speed of convergence (cf. Section 1.3)^2. Our results can be
summarized as follows:

2 Such a result is claimed in [KK2001], where the authors find an estimate
$g_n(x) = g(x)(1 + O(1/n))$, where $g_n$ is the density of $X_n$, while g is the stable
density, and the implicit constants are uniform in x. Such an estimate would be
good enough for our purposes; unfortunately it is incorrect, as can be seen by
noting that $g_n(x) = 0$, $x < 1 - \log n$. A correct estimate is given by [Hall1981];
however, the method we outline is self-contained, simple, and generalizable, so we
choose to present it here.

Theorem B.1. In the case $\alpha = 1$, we have $g_n(x) = g(x) + O(\log^2 n / n)$.

Proof. The proof falls naturally in two parts, both of which require
estimates on the characteristic function of $G_n$. First, we show that we
can throw away the tails of the characteristic function, and then we
estimate convergence in the remaining region. We introduce the following
notation: let $\Psi_n(k)$ be the characteristic function of $G_n$, and let
$g_n^\delta$ be defined as

$g_n^\delta(x) = \frac{1}{2\pi} \int_{-n^{1-\delta}}^{n^{1-\delta}} \exp(-ixk) \Psi_n(k) \, dk.$

In addition, let $R_n(x) = |g_n(x) - g_n^\delta(x)|$. Then

Lemma B.2.

$\sup_{x \in (-\infty, \infty)} R_n(x) = O\left( n e^{-\frac{\pi}{2} n^{1-\delta}} \right).$

Lemma B.2 will be proved in Section B.1.
We will also need the following

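Lemma B.3. Let $\Psi(k)$ be the characteristic function of the stable
density, g, and

$R(x) = \frac{1}{2\pi} \left( \int_{-\infty}^{-M} e^{-ikx} \Psi(k) \, dk + \int_{M}^{\infty} e^{-ikx} \Psi(k) \, dk \right).$

Then $R(x) = O(\exp(-\pi M/2))$.

Proof. Let us recall that $|\Psi(k)| = \exp(-\frac{\pi}{2}|k|)$. Now we write

$|R(x)| \le \frac{1}{2\pi} \left( \int_{-\infty}^{-M} e^{-\frac{\pi}{2}|k|} \, dk + \int_{M}^{\infty} e^{-\frac{\pi}{2} k} \, dk \right) = \frac{1}{\pi} \int_{M}^{\infty} e^{-\frac{\pi}{2} k} \, dk = \frac{2}{\pi^2} e^{-\frac{\pi}{2} M}.$

□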



To end the proof of Theorem B.1 we need to estimate how closely
$\Psi(k)$ is approximated by $\Psi_n(k)$, for $k < n^{1-\delta}$, since by the above
estimates we know that

(61)  $g(x) - g_n(x) = \frac{1}{\pi} \int_0^{n^{1-\delta}} \cos(xk) \left( \Psi(k) - \Psi_n(k) \right) dk$

(plus lower order terms). It remains only to estimate the difference
$|\Psi(k) - \Psi_n(k)|$. From equation (60) we have

$|g(x) - g_n(x)| \le O(\log^2 n / n) \int_0^\infty e^{-\frac{\pi}{2} k} k^2 \, dk = O(\log^2 n / n).$

□



B.1. An estimate on the tails of the characteristic function of
$Y_n$. In this section we will supply the proof of Lemma B.2; we use
the notation introduced before the statement of the lemma. We write
explicitly

$R_n(y) = \Re \int_{n^{1-\delta}}^{\infty} e^{-ik(y + \log n)} \chi^n(k/n) \, dk.$

The argument given below can be easily generalized for any stable law.
The only facts about the function $\chi(k)$ that we are going to use are as
follows:

i) $|\chi(k)| < 1 - c|k|^\beta + o(k)$, k < 1, for some $\beta > 0$;

ii) $|\chi(k)| \le C_1 < 1$, for all $k > k_0$. This holds whenever $\chi$ is not the
characteristic function of a lattice distribution, essentially by the
Riemann-Lebesgue lemma;

iii) $\chi \in L^1$.

Given condition (iii) we have

Lemma B.4. There exists an M such that

$\int_M^\infty |\chi(z)|^n \, dz < \frac{1}{2^{n-1}} \int_M^\infty |\chi(z)| \, dz.$

Proof. By the Riemann-Lebesgue Lemma, there exists $M_2$ such that $|\chi(z)| < 1/2$
for all $z > M_2$. Setting $M = M_2$, the assertion of the lemma follows
immediately. □

Let us introduce the variable z = k/n. We have

$R_n(y) = n \, \Re \int_{n^{-\delta}}^{\infty} e^{-inz(y + \log n)} \chi^n(z) \, dz.$

Next, we note that

$|R_n(y)| \le R_n = n \int_{n^{-\delta}}^{\infty} |\chi^n(z)| \, dz,$

where the last inequality holds for $\delta < 1$ and n sufficiently large. Let
us write

$R_n = R_{n,1} + R_{n,2} + R_{n,3},$

where

$R_{n,1} = n \int_{n^{-\delta}}^{z_0} |\chi(z)|^n \, dz,$

$R_{n,2} = n \int_{z_0}^{z_1} |\chi(z)|^n \, dz,$

$R_{n,3} = n \int_{z_1}^{\infty} |\chi(z)|^n \, dz.$

The constant $z_0$ is chosen so that $|\chi(z)|$ is decreasing from 0 to $z_0$. Such
a $z_0 > 0$ can always be found as long as $\chi(z)$ is continuous at 0, since
$\chi(0) = 1$ and $|\chi(z)| < 1$ for all z sufficiently close to 0 (and all z if
$\chi$ is not the characteristic function of a lattice distribution). If $|\chi(z)|$
is monotonically decreasing always, as seems to be the case with (58),
we take $z_0 = 1$. We choose $z_1$ in such a way that Lemma B.4 holds for
$M = z_1$.

First we consider $R_{n,1}$. By the choice of $z_0$, the function $|\chi(z)|$
monotonically decreases on $[n^{-\delta}, z_0]$. Therefore,

(62)  $|R_{n,1}| < n z_0 |\chi(n^{-\delta})|^n.$

For small values of k, we have (see property (i)):

$|\chi(k)| = 1 - \pi k/2 + O((k \log k)^2).$

Therefore,

$|\chi(n^{-\delta})|^n \propto \exp\left( -\frac{\pi}{2} n^{1-\delta} \right),$

that is, it decays exponentially for any $0 < \delta < 1$. Thus, from (62) we
see that

(63)  $|R_{n,1}| < n z_0 \exp\left( -\frac{\pi}{2} n^{1-\delta} \right).$

Next, we estimate $R_{n,2}$. Because of property (ii), we have

(64)  $|R_{n,2}| < n^{1+\delta} C_1^n,$

and since $C_1 < 1$, it also decays exponentially with n. Finally, we
estimate $R_{n,3}$ by Lemma B.4. Putting everything together we conclude
that $R_n$ decays exponentially with n.